Chief Reader Report on Student Responses: 
2017 AP® Statistics Free-Response Questions 


Number of Students Scored 215,840 


Number of Readers 859 


Score Distribution Exam Score 


Global Mean 


The following comments on the 2017 free-response questions for AP® Statistics were written by the Chief 
Reader Jessica Utts, University of California, Irvine. They give an overview of each free-response question 
and of how students performed on the question, including typical student errors. General comments 
regarding the skills and content that students frequently have the most problems with are included. Some 
suggestions for improving student preparation in these areas are also provided. Teachers are encouraged to 
attend a College Board workshop to learn strategies for improving student performance in specific areas. 
Question #1 Max. Points: 4 Mean Score: 1.72 
What were responses expected to demonstrate in their response to this question? 


The primary goals of this question were to assess a student's ability to (1) explain statistical terms used when 
describing the relationship between two variables; (2) interpret the slope of a linear regression equation; and (3) 
calculate a value of y when given a regression equation, a value of x, and a residual. 


How well did the responses address the course content related to this question? How well did the 
responses integrate the skills required on this question? 


• In part (a), responses did an excellent job of explaining what is meant by a positive relationship and 
including the context. Responses did a moderate job of explaining a strong relationship. Many had trouble 
explaining what is meant by a linear relationship without using circular reasoning words like “line” and 
“linear.” 

• Part (b) required responses to apply a straightforward definition of slope to a particular context. They were 
able to do so, but some neglected to explain that the relationship is not exact. 

• Part (c) required use of standard formulas for predicted values and residuals in a non-standard way. 
Some responses completed only part of the exercise: they found the predicted value but 
neglected to carry out the final step of using the residual. 


What common student misconceptions or gaps in knowledge were seen in the responses to this question? 





Common Misconceptions/Knowledge Gaps 


Responses that Demonstrate Understanding 





• Defining a positive relationship by simply 
saying that there is a positive correlation. 


• A positive relationship means that wolves 
with higher values of length also tend to 
have higher weights. 





• In defining a linear relationship, responses 
failed to link a change in y to a change 
in x. 


• Not clearly indicating that a linear 
relationship has a constant rate of change 
in the response variable (weight) as the 
explanatory variable (length) increases. 


• A linear relationship means that when 
length increases by one meter, weight 
tends to change by a constant amount, on 
average. 


• For any change in length the rate of 
change in weight is the same. 





• Using “correlation” to define a linear 
relationship. In most cases it was not clear 
if “correlation” was used in a statistical 
manner or if it was merely used as a substitute 
for “relationship.” A correlation coefficient 
is a measure of the strength of a linear 
relationship, but it does not by itself 
explain the meaning of a linear 
relationship. It is more appropriate to use 
correlation to discuss a strong relationship. 


• A linear relationship means that when 
length increases by one meter, weight 
tends to change by a constant amount, on 
average. A strong relationship means that 
there is a high correlation (close to 1) 
between length and weight. 
• Indicating that a relationship is strong 
when points in the scatterplot are close 
together or not too scattered. The response 
should indicate that a relationship is 
strong when the points in the scatterplot 
are close to a line, or more generally, a 
curve. 


• A strong relationship means that the points 
are close to the least squares regression 
line. 





• A graph presented without some 
explanation may not provide enough 
information for the terms in part (a). 
Including graphs in explanations, when 
relevant, often helps to strengthen and 
clarify a response. However, in the 
explanation of positive, linear, or strong, a 
single graph was generally not acceptable 
if it was not accompanied by some written 
communication or a second graph. A 
graph-based response with no 
written communication, or no useful 
written communication, was scored as 
acceptable only if it had a pair of graphs 
with one illustrating the attribute and the 
other illustrating what the attribute is not. 


• The following graphs would illustrate what 
is meant by “linear” by offering a 
comparison of linear and not linear. 

[Figure: two sketched graphs, one labeled “Linear” and one labeled “Not linear”] 





• Implying that the slope of a least squares 
regression line corresponds to an exact 
relationship between changes in observed 
values of y as x changes. 


Examples of acceptable responses are: 

• The predicted weight increases by 
35.02 kg for each 1-meter increase in 
length. 

• Weight increases by 35.02 kg for each 
1-meter increase in length, on average. 


• Failure to link the increase in the predicted 
response to an increase of a specific size in 
the explanatory variable. For instance, an 
unacceptable response is “For any change 
in length, the predicted weight increases 
by 35.02 kg.” 


• The predicted weight increases by 
35.02 kg for each 1-meter increase in 
length. 








• In calculating the actual weight from the 
regression equation for a specific x when 
a residual is given, many responses 
stopped after computing the predicted 
value. 


• Some responses incorrectly replaced the 
residual with the intercept in the 
calculation of the actual response. 


• Predicted weight 
= -16.46 + 35.02(1.4) = 32.568 kg 


• Actual weight = 32.568 + (-9.67) = 22.9 kg 
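
The two-step calculation in part (c) can be checked with a short Python sketch. The coefficients, the length of 1.4 meters, and the residual of -9.67 kg are taken from the worked response above; the variable names are illustrative only.

```python
# Regression equation from the question: predicted weight = -16.46 + 35.02 * length
intercept = -16.46      # kg
slope = 35.02           # kg per meter
length_m = 1.4          # length of the wolf, in meters
residual = -9.67        # residual = actual - predicted, in kg

# Step 1: predicted weight from the regression line
predicted = intercept + slope * length_m

# Step 2: actual = predicted + residual (the step many responses omitted)
actual = predicted + residual

print(f"Predicted weight: {predicted:.3f} kg")
print(f"Actual weight: {actual:.1f} kg")
```

Stopping after step 1 reports the predicted weight rather than the actual weight, which is the most common error described above.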
Based on your experience at the AP® Reading with student responses, what advice would you 
offer to teachers to help them improve the student performance on the exam? 


• Some students found it easier to explain the concepts of positive, linear, and strong relationships by 
formulating responses in the framework of points on the scatterplot described in part (a) instead of using 
abstract explanations. Responses tended to be less precise when students tried to give more abstract 
explanations that were based on potential patterns of points in scatterplots. Responses to part (a) also 
tended to be more off-target when students based them on regression models. There was no mention of 
any model in part (a), only a description of a scatterplot. Perhaps students were motivated to think about 
regression models when they read parts (b) and (c) of the question, which were based on a regression 
equation. While it is good practice for students to read the entire question before they begin to answer any 
parts of the question, they should focus on answering each individual part without pulling in ideas from 
later parts of the question. 

• An explanation of a linear relationship should indicate that the rate of change in y is the same for any 
change in x. Examples of inappropriate statements are: 

o Change in y is the same for any change in x. (This implies that a 1-unit increase in x is associated 
with the same change in y as a 2-unit increase in x.) 

o For any x, the change in y is the same. (This does not relate a change in y to a change in x.) 

o x and y change at the same rate. (This implies a line with slope equal to 1, which is too restrictive and 
does not define all lines.) 

o y is directly proportional to x. This is too restrictive because it implies that y must be 0 when x is 0. 
(A square is a rectangle but the definition of a square is not a good definition of a rectangle.) Students 
should avoid the use of “proportional,” “directly proportional,” and “constant ratio” unless they are 
comparing rates of change for two variables. 

• Use of the word correlation. It is often difficult to determine if the response used correlation as a substitute for 
relationship in the English (instead of statistical) meaning of the word. It is better to use correlation coefficient to 
make it clear that correlation is being used in a statistical sense. 

• “There is a positive correlation” does not provide an explanation of a positive relationship, nor does it provide an 
explanation of a linear relationship. A correlation coefficient is a measure of the strength of a linear relationship. 

• In using the concept of a correlation coefficient to describe that a linear relationship is strong, it is good practice 
to provide a range of numerical values to quantify what is strong; for example, a value of a correlation coefficient 
between 0.7 and 1. Using numerical values also clarifies that correlation is being used in a statistical context. 

• An interpretation of a slope of a regression line should relate a specific change in the predicted response to a 
specific change in the explanatory variable. A correct interpretation is “for each 1-meter increase in length, the 
weight of wolves is predicted to increase by 35.02 kg.” An incorrect interpretation is “for any increase in 
length, the weight of wolves is predicted to increase by 35.02 kg.” 

• If asked for a value of a response in a regression problem, use the formula for the least squares regression line to 
compute the predicted response. If the question gives a value of a residual, use it to compute the actual value of 
the response from the predicted value. 
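
The distinction between a constant rate of change and direct proportionality can be illustrated numerically. The following Python sketch uses the regression equation from this question; the particular lengths chosen are arbitrary illustrative values, not data from the exam.

```python
def predicted_weight(length_m):
    """Predicted weight (kg) from the regression equation in this question."""
    return -16.46 + 35.02 * length_m

# Equally spaced lengths (illustrative values, 0.2 meters apart)
lengths = [1.0, 1.2, 1.4, 1.6]

# Linear: equal steps in x produce equal changes in predicted y
changes = [predicted_weight(b) - predicted_weight(a)
           for a, b in zip(lengths, lengths[1:])]

# Not proportional: the ratio y/x is not constant because the intercept is not 0
ratios = [predicted_weight(x) / x for x in lengths]

print("Changes per 0.2 m step:", [round(c, 3) for c in changes])
print("Ratios weight/length:", [round(r, 2) for r in ratios])
```

Each 0.2-meter step changes the predicted weight by the same amount (0.2 times the slope), while the ratio of weight to length varies from one length to the next. This is why “directly proportional” is too restrictive as a definition of “linear.”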








What resources would you recommend to teachers to better prepare their students for the 
content and skill(s) required on this question? 


In general, review of previous Chief Reader reports will give teachers excellent insight into what constitutes strong 
statistical reasoning, as well as common student errors and how to address them in the classroom. Previously 
released exam questions and participation in the Online Teacher Community are of high value for practice and for 
advice. 
Question #2 Max. Points: 4 Mean Score: 2.22 
What were responses expected to demonstrate in their response to this question? 


The primary goals of this question were to assess a student's ability to (1) construct and interpret a confidence 
interval for a population proportion and (2) use a confidence interval for a proportion to find a confidence interval 
for a dollar amount that can be calculated using that proportion. 


How well did the responses address the course content related to this question? How well did the 
responses integrate the skills required on this question? 


• Part (a) is a standard inference question: construct and interpret a confidence interval. It addresses 
specific course content and skills, and responses generally demonstrated good use of those skills, with some 
common errors. 

• Part (b) required an extension of standard content to demonstrate statistical reasoning skills. Responses 
generally did a nice job of demonstrating this skill, with the most common errors related to not reading the 
question carefully. 


What common student misconceptions or gaps in knowledge were seen in the responses to this 
question? 











Common Misconceptions/Knowledge Gaps 


Responses that Demonstrate Understanding 


• Identification of procedure missing or incorrect. 


• The appropriate procedure is a one-sample z-interval for a population proportion. 


One or more of the following errors related to the conditions: 

• Not checking conditions at all. 
• Omitting the large sample condition (np ≥ 10 and nq ≥ 10) or verifying only one of the two inequalities. 
• Mislabeling conditions, such as “Independence” for np ≥ 10 and nq ≥ 10. 
• Stating as a condition that the sample or population has a normal distribution (for a 
categorical variable), or vague reference to a normal distribution. 
• Inappropriate large sample condition: n ≥ 30. 


• Correct conditions: 
1. Random sample 
2. Large sample (number of successes np ≥ 10 and number of failures n(1 − p) ≥ 10) 
• For condition 1, the stem of the problem states that a random sample of customers 
who asked for a water cup was used. 
• For condition 2, the number of “successes” (filled cup with soft drink) is 23 and the 
number of failures is 57; both are at least 10. 
Errors in mechanics: 


• Using an incorrect critical value (wrong z* or t*). 

• Showing df; using t instead of z. 

• Using an incorrect formula for the 
standard error of a sample proportion. 


• The correct confidence interval is: 

$\hat{p} \pm z^{*}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$ 

which is 

$0.2875 \pm 1.96\sqrt{\dfrac{0.2875(1 - 0.2875)}{80}}$ 

The interval is 0.1883 to 0.3867. 





• Calculating an unreasonable interval for a 
proportion, not realizing that a proportion 
must be between 0 and 1. 


• The interval is 0.1883 to 0.3867. 


• Not recognizing that the population of 
interest is all customers who asked for a 
water cup at this fast-food restaurant. 


• When defining the parameter, state that 
“The population is all customers of this 
restaurant who ask for a water cup, and p 
is the proportion of that population that will 
fill the cup with a soft drink.” 


• Interpreting confidence level instead of 
confidence interval. 


• We can be 95% confident that in the 
population of all customers at this fast-food 
restaurant who ask for a water cup, the 
proportion that will fill it with a soft drink 
is between 0.1883 and 0.3867. 





Errors in part (b): 


• Calculating a single value (point estimate) 
rather than an interval. 

• Not using the interval from part (a) as 
directed. 

• Not showing work. 


• Using the confidence interval in part (a), a 
95% interval estimate for the number of 
customers in June who asked for a water 
cup but then filled it with a soft drink is 
3,000 × 0.1883 to 3,000 × 0.3867, or 565 to 
1,160. At a cost of $0.25 per customer, a 
95% interval estimate for the cost to the 
restaurant in June is $141.25 to $290.00. 
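
The interval in part (a) and its extension to a dollar amount in part (b) can be reproduced with a short Python sketch. All numbers (sample proportion 0.2875, n = 80, z* = 1.96, 3,000 customers, $0.25 per customer) come from the responses above; the variable names are illustrative, and the rounding steps mirror the order used in the sample response.

```python
import math

p_hat = 0.2875          # sample proportion who filled the cup with a soft drink
n = 80                  # sample size
z_star = 1.96           # critical value for 95% confidence

# Part (a): one-sample z-interval for a proportion
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
lower = round(p_hat - margin, 4)
upper = round(p_hat + margin, 4)

# Part (b): scale both endpoints by the number of customers, then by the cost
customers_in_june = 3000
cost_per_customer = 0.25
count_low = round(customers_in_june * lower)
count_high = round(customers_in_june * upper)
cost_low = count_low * cost_per_customer
cost_high = count_high * cost_per_customer

print(f"Proportion: {lower} to {upper}")
print(f"Customers: {count_low} to {count_high}")
print(f"Cost: ${cost_low:.2f} to ${cost_high:.2f}")
```

Both endpoints of the interval are carried through every step, which is what distinguishes an interval estimate of the cost from a single point estimate.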














Based on your experience at the AP® Reading with student responses, what advice would you 
offer to teachers to help them improve the student performance on the exam? 


• In inference questions, ask students to identify the population and parameter of interest. Encourage 
students to use the language in the stem of the question when defining the parameter. 

• Discuss why each condition is being checked for an inference procedure and help students understand 
how to check the condition. Use applets and hands-on activities to demonstrate what happens when each 
condition isn’t met. 

• Insist on proper notation throughout the course and refer students to the formula sheet. 

• Emphasize the difference between interpreting a confidence interval and a confidence level. Use hands-on 
activities and applets involving repeated sampling to illustrate the idea of a confidence level interpretation. 

• Give students ample practice in distinguishing categorical variables/data (proportions) from quantitative 
variables/data (means). 

• Have students make a summary chart of inference procedures with the appropriate names, conditions, and 
formulas for each. 
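
The repeated-sampling idea behind a confidence level can be sketched as a small simulation. This is a minimal illustration, not an activity from the report: the true proportion of 0.3, the number of repetitions, and the function names are all assumptions chosen for the example.

```python
import math
import random

def z_interval(successes, n, z_star=1.96):
    """One-sample z-interval for a proportion."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage_rate(true_p, n, repetitions, seed=0):
    """Fraction of simulated samples whose interval captures true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Draw one random sample of size n and count the successes
        successes = sum(rng.random() < true_p for _ in range(n))
        lower, upper = z_interval(successes, n)
        if lower <= true_p <= upper:
            hits += 1
    return hits / repetitions

# Hypothetical population proportion; n = 80 matches the sample size in this question
rate = coverage_rate(true_p=0.3, n=80, repetitions=2000)
print(f"Proportion of intervals capturing the true value: {rate:.3f}")
```

The capture rate should land near 95%, which describes the long-run behavior of the method (the confidence level). A single interval such as 0.1883 to 0.3867 is interpreted differently, as a range of plausible values for the one population proportion.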


What resources would you recommend to teachers to better prepare their students for the 
content and skill(s) required on this question? 


In general, review of previous Chief Reader reports will give teachers excellent insight into what constitutes strong 
statistical reasoning, as well as common student errors and how to address them in the classroom. Previously 
released exam questions and participation in the Online Teacher Community are of high value for practice and for 
advice. 
Question #3 Max. Points: 4 Mean Score: 1.72 


What were responses expected to demonstrate in their response to this question? 


The primary goals of this question were to assess a student’s ability to (1) calculate a probability from a normal 
distribution; (2) calculate a weighted probability from two individual probabilities; and (3) calculate a conditional 
probability for dependent events when individual and joint probabilities are provided. 


How well did the responses address the course content related to this question? How well did the 
responses integrate the skills required on this question? 


• Most responses were able to calculate the normal probability, but made some errors in 
notation and in showing work. 

• Many responses had trouble calculating the weighted probability because they confused 
independent and dependent events, and/or probability rules for independent events. 

• Many responses confused conditional probabilities with joint probabilities. 


What common student misconceptions or gaps in knowledge were seen in the responses to this question? 


Common Misconceptions/Knowledge Gaps 


Responses that Demonstrate Understanding 


• Use of the t-distribution to solve a normal 
probability. 


• $P(X > 137) = P\left(Z > \dfrac{137 - 133}{5}\right) = P(Z > 0.8) = 0.2119$ 


• Use of $\bar{x}$, as in $P(\bar{x} > 137)$, to solve a 
normal probability. 


• $P(X > 137)$ 


• Use of calculator-speak normalcdf(137, 
1000000, 133, 5) without identifying 
parameters and boundary conditions. 


• For notation normalcdf(137, 1000000, 133, 5), 
137 is the lower bound, 1,000,000 is the upper 
bound, 133 is μ, and 5 is σ. 


• Generally, not knowing how much work is 
needed to justify normal probability 
calculations. 


• $P(X > 137) = P\left(Z > \dfrac{137 - 133}{5}\right) = P(Z > 0.8) = 0.2119$ 

• OR shown in a figure, as follows: 

[Figure: sketch of a normal curve illustrating the probability calculation] 
• Not knowing the difference between 
parameters and statistics. 


• $P\left(Z > \dfrac{137 - \mu}{\sigma}\right)$, not $P\left(Z > \dfrac{137 - \bar{x}}{s}\right)$ 


• Thinking the Empirical Rule can yield 
normal probabilities for values other than 
1, 2, and 3 standard deviations on either 
side of the mean. 


• $P(X > 137) = P\left(Z > \dfrac{137 - 133}{5}\right) = P(Z > 0.8) = 0.2119$ 


• Thinking that 137 can be adjusted as if 
the normal random variable is discrete 
and using 137.5 or 138 in place of 137. 


• $P(X > 137) = P\left(Z > \dfrac{137 - 133}{5}\right) = P(Z > 0.8) = 0.2119$ 





• Not recognizing that two events are 
mutually exclusive. 


• P(G) = P(G and J) + P(G and K) 
= 0.1483 + 0.2524 
= 0.4007 


• Generally not knowing how much work is 
needed to justify probability calculations. 


• Not being able to properly create a tree 
diagram. 


• P(G) = P(G | J) × P(J) + P(G | K) × P(K) 
= (0.2119)(0.7) + (0.8413)(0.3) 
= 0.1483 + 0.2524 = 0.4007 

[Tree diagram] 
J (0.7) 
    G (0.2119): (0.7)(0.2119) = 0.1483 
    not G (0.7881): (0.7)(0.7881) = 0.5517 
K (0.3) 
    G (0.8413): (0.3)(0.8413) = 0.2524 
    not G (0.1587): (0.3)(0.1587) = 0.0476 


• Not being able to find the appropriate 
probabilities from a tree diagram. For 
example, thinking the conditional branch 
on the tree is actually the intersection 
probability. 


• P(G) = P(G | J) × P(J) + P(G | K) × P(K) 
= (0.2119)(0.7) + (0.8413)(0.3) 
= 0.1483 + 0.2524 = 0.4007 


• Not being able to correctly use the 
probability of a given event in computing 
a conditional probability, omitting that 
probability, or switching it with a 
different probability. 


• P(G) = P(G | J) × P(J) + P(G | K) × P(K) 
= (0.2119)(0.7) + (0.8413)(0.3) 
= 0.1483 + 0.2524 = 0.4007 
• Not being able to distinguish between a 
question that is asking for a conditional 
probability and one that is asking for an 
intersection probability; such as, ignoring 
the “given” in a question. 


• $P(J \mid G) = \dfrac{P(J \text{ and } G)}{P(G)} = \dfrac{P(G \mid J)P(J)}{P(G)} = \dfrac{(0.2119)(0.7)}{0.4007} = \dfrac{0.1483}{0.4007} = 0.3701$ 


• Assuming two events are independent 
when they are not. 


• $P(J \mid G) = \dfrac{P(J \text{ and } G)}{P(G)} = \dfrac{P(G \mid J)P(J)}{P(G)} = \dfrac{(0.2119)(0.7)}{0.4007} = \dfrac{0.1483}{0.4007} = 0.3701$ 


• Not being able to recognize that a 
calculation from earlier work could be 
used in subsequent calculations. 


• $\dfrac{(0.7)(\text{answer from part (a)})}{\text{answer from part (b)}}$ 
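
The chain of calculations across the three parts can be sketched in Python using the standard library's `statistics.NormalDist`. The mean of 133 and standard deviation of 5, the weights 0.7 and 0.3, and the value P(G | K) = 0.8413 are taken from the responses above; the variable names are illustrative, and 0.8413 is used as given rather than recomputed.

```python
from statistics import NormalDist

# Part (a): P(X > 137) for a normal distribution with mean 133 and SD 5
p_g_given_j = 1 - NormalDist(mu=133, sigma=5).cdf(137)

# Part (b): weighted (total) probability over the two mutually exclusive cases
p_j, p_k = 0.7, 0.3
p_g_given_k = 0.8413            # value stated in the responses above
p_g = p_g_given_j * p_j + p_g_given_k * p_k

# Part (c): conditional probability, reusing the results of parts (a) and (b)
p_j_given_g = (p_g_given_j * p_j) / p_g

print(f"P(G | J) = {p_g_given_j:.4f}")
print(f"P(G)     = {p_g:.4f}")
print(f"P(J | G) = {p_j_given_g:.4f}")
```

Part (c) divides the joint probability P(J and G) by P(G). Reporting only the numerator answers a different question (an intersection probability), which is the confusion described above.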














Based on your experience at the AP® Reading with student responses, what advice would you 
offer to teachers to help them improve the student performance on the exam? 


• In solving problems, model what the students should do by showing all work/steps in a probability problem 
(and inference). 

• When teaching the Empirical Rule, be sure to relate calculated normal probabilities to the rule. Emphasize 
that the Empirical Rule gives only approximations of normal values and only for 1, 2, and 3 standard 
deviations from the mean; any sort of interpolation will give incorrect answers. 

• When introducing the t-distribution, emphasize that the only time the z-distribution and 
t-distribution are the same is when the t-distribution is based on an infinite number of degrees of freedom, 
which is never going to happen. A corollary is that any normal probability problem should be solved using 
the z-distribution, not the t-distribution. 

• In general, it is not a good idea to use “calculator speak” in answering any question. 

• Show the students problems where there are multiple parts and the answers for the later parts depend 
upon the results for the earlier parts. 

• If any continuous approximation of a discrete random variable is taught, explain the reason for any 
adjustment or continuity correction. Hopefully, this will decrease the probability that the student will 
attempt to use a correction for a continuous distribution. 

• After introducing the sampling distribution for the sample mean, go through an example that starts with 
the assumption of normality, calculates a standard normal probability for a single value of X, then for a 
value of the sample mean. Explain that technically it is possible (under the assumption of a normal 
population) to calculate the probability for a single value by using the sample mean based on a sample of 
size one, but this approach involves more work and could lead to errors later when a sample is not taken 
from a normal population. Then continue the example where the assumption of a normal population is not 
valid (best if one sample size is below 30 and the other is above 30). 

• Students have trouble with “independence” versus “mutually exclusive.” Early on, give an example where 
there are three events, two are mutually exclusive, two are independent, and two are neither. For example, 
toss three coins and note whether each is a heads or tails. Let event A represent getting at least two heads, 
event B represent getting exactly two heads, and event C represent getting all heads or all tails. A and B are 
neither independent nor mutually exclusive, A and C are independent (not obviously, so it helps to reinforce 
that independence is based on probability, not appearances), and B and C are mutually exclusive. 
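
The three-coin example can be verified by enumerating the eight equally likely outcomes. The following Python sketch defines events A, B, and C as described above; the helper function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of tossing three coins
outcomes = list(product("HT", repeat=3))

def prob(event):
    """Exact probability of an event, given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def A(o): return o.count("H") >= 2        # at least two heads
def B(o): return o.count("H") == 2        # exactly two heads
def C(o): return o.count("H") in (0, 3)   # all heads or all tails

def both(e1, e2):
    return lambda o: e1(o) and e2(o)

def independent(e1, e2):
    # Independent when P(E1 and E2) = P(E1) * P(E2)
    return prob(both(e1, e2)) == prob(e1) * prob(e2)

def mutually_exclusive(e1, e2):
    # Mutually exclusive when the events cannot occur together
    return prob(both(e1, e2)) == 0

for name, (e1, e2) in {"A,B": (A, B), "A,C": (A, C), "B,C": (B, C)}.items():
    print(name, "independent:", independent(e1, e2),
          "mutually exclusive:", mutually_exclusive(e1, e2))
```

For A and C, P(A) = 1/2, P(C) = 1/4, and P(A and C) = 1/8, so the product rule holds exactly even though the events do not look unrelated.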
What resources would you recommend to teachers to better prepare their students for the 
content and skill(s) required on this question? 


In general, review of previous Chief Reader reports will give teachers excellent insight into what constitutes strong 
statistical reasoning, as well as common student errors and how to address them in the classroom. Previously 
released exam questions and participation in the Online Teacher Community are of high value for practice and for 
advice. 
Question #4 Max. Points: 4 Mean Score: 1.71 


What were responses expected to demonstrate in their response to this question? 


The primary goals of this question were to assess a student's ability to use boxplots to (1) compare multiple sets of 
data; (2) identify which set of data is most likely to have produced a particular summary value; and (3) determine 
which variable is most useful for classifying a new observation. 


How well did the responses address the course content related to this question? How well did the 
responses integrate the skills required on this question? 


• Responses generally indicated an understanding of what is displayed in a boxplot. 

• Responses had problems with comparing boxplots to answer specific questions because they tended not to 
give an actual comparison. Responses understood how to choose which one was best for a given purpose, 
but did not specify why the others were not as good. 


What common student misconceptions or gaps in knowledge were seen in the responses to this question? 





Common Misconceptions/Knowledge Gaps 


Responses that Demonstrate Understanding 





• Stating that symmetric boxplots 
indicate that the distribution is normal. 


• Complete shape information cannot be 
determined from a boxplot. 


• Students were asked to describe a 
similarity and a difference for 
chemical Z across the three sites. Some 
students used only a location, such as 
the maximums or the minimums. 


• The median value for the percent of 
chemical Z in the pottery pieces is similar 
for all three sites, at about 7%. The ranges 
for the percent of chemical Z are much 
different for the three sites, with the 
smallest range being about 2% (from 6% to 
8%) at site II, a much higher range of 
about 6% (from about 4% to 10%) at site I 
and the largest range of about 8% (from 
about 3% to 11%) at site III. 


• Using the term range incorrectly: many 
responses referred to range as an 
interval instead of a number. 


• The ranges for the percent of chemical Z 
are much different for the three sites, with 
the smallest range being about 2% (from 
6% to 8%) at site II, a much higher range 
of about 6% (from about 4% to 10%) at site 
I and the largest range of about 8% (from 
about 3% to 11%) at site III. 
• Some responses described many 
attributes of the boxplots for chemical Z 
at each site but never clearly stated 
what is similar and what is different. 
This is a “laundry list,” not a 
comparison. 


• The median value for the percent of 
chemical Z in the pottery pieces is similar 
for all three sites, at about 7%. The ranges 
for the percent of chemical Z are much 
different for the three sites, with the 
smallest range being about 2% (from 6% to 
8%) at site II, a much higher range of 
about 6% (from about 4% to 10%) at site I 
and the largest range of about 8% (from 
about 3% to 11%) at site III. 





• For part (b-i), many responses selected 
site III based on the sums of the 
medians instead of the sums of the 
minimums and the sums of the 
maximums. 


• The piece most likely originated at site III. 
Although values outside of the range of 
data observed in the samples would be 
possible, using the available data results 
in approximate minimum and maximum 
sums of the percents for the three 
chemicals, as shown in the table below. 
The only site that includes 20.5 between 
the sums of the minimum and maximum 
values is site III. [Response then includes 
a table showing the sums of minimums and 
maximums for each site.] 


• Many responses correctly selected 
site III but did not state why sites I and 
II were not the best choices. 


• The only site that includes 20.5 between 
the sums of the minimum and maximum 
values is site III. 








• In part (b-ii), many responses had 
difficulty clearly explaining that the 
boxplots do not overlap. In order to do 
this, both a difference in location and 
small variability needed to be 
addressed. Some responses described 
the boxplots as having different means, 
different variability, or simply said that 
they have different boxplots. None of 
these descriptors indicates no overlap. 


• Chemical Y would be most useful, 
because the distributions of the 
percentages of total weights at the three 
sites do not overlap. The distributions of 
chemicals X and Z have substantial 
overlap. 
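
The difference between a range (a single number) and an interval, and the idea of overlapping distributions, can be illustrated with the approximate chemical Z values quoted in the responses above. This Python sketch uses only those minimum and maximum values; no data for chemicals X or Y appear in this report, so they are not included, and the function name is illustrative.

```python
from itertools import combinations

# Approximate (minimum, maximum) percents of chemical Z, as quoted above
z_bounds = {"I": (4, 10), "II": (6, 8), "III": (3, 11)}

# The range is a single number: maximum minus minimum
z_ranges = {site: high - low for site, (low, high) in z_bounds.items()}

def intervals_overlap(a, b):
    """True if the intervals [a_min, a_max] and [b_min, b_max] share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

print("Ranges for chemical Z:", z_ranges)
for s1, s2 in combinations(z_bounds, 2):
    print(f"Sites {s1} and {s2} overlap:", intervals_overlap(z_bounds[s1], z_bounds[s2]))
```

Every pair of sites overlaps for chemical Z, which is why a measurement of chemical Z alone would not identify the site a piece came from.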
Based on your experience at the AP® Reading with student responses, what advice would you 
offer to teachers to help them improve the student performance on the exam? 


• Read the question carefully and answer (only) the question asked. If the question asks for a similarity and a 
difference, clearly organize and state what is similar and what is different. 

• When asked to identify one choice among three, complete justification includes reasoning for that particular 
choice, as well as rationales for not choosing the other options. 

• Use clear communication within each part of the question. Do not assume that the reader will look back. 
Instead, incorporate previous work into the answer. 


What resources would you recommend to teachers to better prepare their students for the 
content and skill(s) required on this question? 


In general, review of previous Chief Reader reports will give teachers excellent insight into what constitutes strong 
statistical reasoning, as well as common student errors and how to address them in the classroom. Previously 
released exam questions and participation in the Online Teacher Community are of high value for practice and for 
advice. 
Question #5 Max. Points: 4 Mean Score: 1.51 
What were responses expected to demonstrate in their response to this question? 


The primary goal of this question was to assess a student’s ability to identify, set up, perform, and interpret the 
results of an appropriate hypothesis test to address a particular question. More specific goals were to assess a 
student’s ability to (1) state appropriate hypotheses; (2) identify the appropriate statistical test procedure and 
check appropriate conditions for inference; (3) calculate the appropriate test statistic and p-value; and (4) draw 
an appropriate conclusion, with justification, in the context of the study. 


How well did the responses address the course content related to this question? How well did the 
responses integrate the skills required on this question? 


• Responses did a good job of identifying hypotheses, stating a decision in terms of the alternative 
hypothesis, and making a conclusion in context. 

• Many responses included mistakes in the details of naming the test and carrying out the mechanics. 


What common student misconceptions or gaps in knowledge were seen in the responses to this 
question? 





Common Misconceptions/Knowledge Gaps 


Responses that Demonstrate Understanding 


• Responses incorrectly used the idea of 
sufficient evidence (given in the stem of 
the problem) to state their hypotheses: 

H_a: There is sufficient evidence of an 
association. 

H_0: There is sufficient evidence of no 
association. 


• H_0: Age group at diagnosis and gender 
are independent (that is, they are not 
associated) for the population of people 
currently being treated for schizophrenia. 
H_a: Age group at diagnosis and gender 
are not independent for the population of 
people currently being treated for 
schizophrenia. 


• Some responses had trouble identifying 
the correct conditions. Problems 
included: 

o Listing incorrect conditions such as 
n > 30, or “both samples 
independent.” 
o Stating the condition that expected 
counts are > 5, but not verifying it by 
computing them. 
o Stating that the expected count 
condition is required for normality. 


• The expected counts for all 8 cells of the 
table are at least 5, as seen in the following 
table, with expected counts shown below 
observed counts: 

          20 to 29   30 to 39   40 to 49   50 to 59   Total 
Women        46         40         21         12       119 
           56.91      36.22      17.25       8.62 
Men          53         23          9          3        88 
           42.09      26.78      12.75       6.38 
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Some responses showed incorrect work x’ = 2.093 + 0.395 + 0.817 + 1.322 
for the test statistics and/or the p-value. 


° +2.830 + 0.534 + 1.105 + 1.788 
= 10.884 


e The p-value is Ply? > 10.884) = 0.012, 
based on (4 — 1)(2 — 1) = 3 degrees of 





freedom. 
e Some responses wrote statistical e Because the p-value is very small (for 
conclusions as definitive statements instance much smaller than a = 0.05), we 
“we conclude ...” or “we prove.” would reject the null hypothesis and 


conclude that the sample data provide 
strong evidence that there is an association 
between age group at diagnosis and 
gender for the population currently being 
treated for schizophrenia. 





Some responses did not use appropriate Because the p-value is very small (for 
linkage between the p-value anda instance much smaller than a = 0.05), we 
stated level of alpha in making a would reject the null hypothesis... 
statistical decision. 








x’ = 2.093 + 01395 + 0.817 + 1.322 


Some responses used the bar graph . 42.830 + 0.534 +1.105 +1.788 


provided and stated conclusions about 


the sample data, but did not carry out = 10.884 

any inference. e The p-value is Ply? > 10.884) = 0.012, 
based on (4 — 1)(2 — 1) = 3 degrees of 
freedom. 











Based on your experience at the AP® Reading with student responses, what advice would you 
offer to teachers to help them improve the student performance on the exam? 


Remind students that the null hypothesis is about the population and not the sample. It may be helpful to 
point out that no association (independence) is synonymous with status quo/no change, similar to a null 
hypothesis of μ = μ₀. 

Remind students that it is important for them to name the test they are performing. Help students 
distinguish between the chi-square test for independence and the chi-square test for homogeneity. The 
distinction lies in the sampling method: one sample classified on two variables (independence) versus 
separate samples taken from multiple populations (homogeneity). 

Help students understand not only the correct conditions needed for hypothesis tests, but also 
why those conditions are necessary. Remind students to label their work well and to communicate clearly 
what they are doing. 

Emphasize to students how the values that they obtain from the calculator are calculated. Ideally, 
students should be familiar with how to calculate values both by formula and by calculator. 

Emphasize to students that statistical conclusions are not definitive and that their wording should convey 
some level of uncertainty. 

Highlight the importance of good communication; students should state why they are making the 
decisions that they are making. 
Emphasize to students the difference between evidence from a sample (which can be obtained from the 
graph) and convincing statistical evidence (which can be obtained from a hypothesis test). 
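
The advice to calculate values by formula as well as by calculator can be illustrated with a short Python 
sketch. It is illustrative only: the women's row of observed counts (46, 40, 21, 12) is an assumption inferred 
from the reported chi-square contributions rather than quoted directly, the function names are hypothetical, 
and the tail-probability formula used is a closed form that holds only for 3 degrees of freedom. 

```python
import math

# Observed counts by age group at diagnosis (20-29, 30-39, 40-49, 50-59).
# Assumption: the women's row is inferred from the reported chi-square
# contributions; the men's row follows the counts and row total in the report.
observed = {
    "Women": [46, 40, 21, 12],
    "Men": [53, 23, 9, 3],
}


def expected_counts(table):
    """Expected count per cell = (row total * column total) / grand total."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    return {
        name: [sum(row) * c / grand_total for c in col_totals]
        for name, row in table.items()
    }


def chi_square_statistic(table, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for name in table
        for o, e in zip(table[name], expected[name])
    )


def chi_square_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    This closed form is specific to 3 degrees of freedom.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


expected = expected_counts(observed)
statistic = chi_square_statistic(observed, expected)
df = (len(observed["Women"]) - 1) * (len(observed) - 1)
p_value = chi_square_sf_df3(statistic)

# Verify the expected-count condition by computing it, not just stating it.
condition_met = all(e >= 5 for row in expected.values() for e in row)

print(f"all expected counts >= 5: {condition_met}")
print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
```

Under these assumptions the sketch agrees, to within rounding, with the test statistic and p-value shown in the 
model response. For tables with other degrees of freedom, a statistical library would be needed in place of 
the closed form. 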


What resources would you recommend to teachers to better prepare their students for the 
content and skill(s) required on this question? 


In general, review of previous Chief Reader reports will give teachers excellent insight into what constitutes strong 
statistical reasoning, as well as common student errors and how to address them in the classroom. Previously 
released exam questions and participation in the Online Teacher Community are of high value for practice and for 
advice. 
Question #6 Max. Points: 4 Mean Score: 0.99 


What were responses expected to demonstrate in their response to this question? 


The primary goals of this question were to assess a student's ability to (1) calculate probabilities associated with 
treatment and control group memberships for two different methods of random assignment and (2) justify which 
random assignment method is more appropriate in a given situation. 


How well did the responses address the course content related to this question? How well did the 
responses integrate the skills required on this question? 


• Responses generally did a good job of calculating probabilities for the two methods, but did not show 
sufficient work to make clear how the probabilities were found. 


• Responses generally were able to identify the preferred randomization method in the context of the problem, 
but did not communicate well why it would be preferred in this situation. 


What common student misconceptions or gaps in knowledge were seen in the responses to this 
question? 





Common Misconceptions/Knowledge Gaps | Responses that Demonstrate Understanding 

• Misconception: Although many responses listed the correct probabilities in the table in (i) of parts (a) 
  and (b), the justification was often missing or incomplete. 
  Demonstrates understanding: P(Arrangement A) = P(TT) = (1/2)(1/2) = 1/4 

• Misconception: In (ii) of parts (a) and (b), responses failed to include BOTH Arrangements A and D when 
  asked for the probability that both men end up in the same treatment group. 
  Demonstrates understanding: 
    Part (a-ii): P(A) + P(D) = 
    Part (b-ii): P(A) + P(D) = 

• Misconception: In part (ii), when attempting to combine the probabilities of arrangements A and D, some 
  responses incorrectly multiplied the two probabilities rather than adding them. 
  Demonstrates understanding: 
    Part (a-ii): P(A) + P(D) = 
    Part (b-ii): P(A) + P(D) = 

• Misconception: In part (ii), when attempting to combine the probabilities of arrangements A and D, some 
  responses incorrectly subtracted P(A) x P(D) from P(A) + P(D). 
  Demonstrates understanding: 
    Part (a-ii): P(A) + P(D) = 
    Part (b-ii): P(A) + P(D) = 

• Misconception: Some responses included only a benefit of the chip method or only a drawback of the coin 
  method, but not both. 
  Demonstrates understanding: The chip method gives equal probability to all possible arrangements, but the 
  coin method does not, as shown in the tables from parts (a-i) and (b-i). Furthermore, the coin method is 
  more likely to result in imbalanced treatment groups with regard to students and teachers, based on the 
  probabilities in parts (a-ii) and (b-ii). 

• Misconception: Although many responses said it would be good to have equally likely arrangements or to 
  avoid imbalanced treatment groups, almost no responses explained why this is important. 
  Demonstrates understanding: If food preferences for teachers are different from those of students, an 
  imbalance is a problem. For example, if one treatment group consists entirely of students, it would be 
  impossible to know if a difference in the response variable is due to the treatment (type of meal) or the 
  role of the person at the school (teacher or student). 

• Misconception: Some responses never explicitly made a choice, even though they included correct comments 
  about the chip and coin methods. 
  Demonstrates understanding: Use the chip method. 





Based on your experience at the AP® Reading with student responses, what advice would you 
offer to teachers to help them improve the student performance on the exam? 





For any probability calculation, students should provide justification that is easy to follow. 

Make sure students read the question carefully, including information in the initial stem of the question. 
Make sure students know the difference between P(A or D) and P(A and D). 

When using the general addition rule, remind students that P(A and D) = 0 when events are mutually 
exclusive. 

When asked to make a choice between options, make sure students explain both why they chose the option 
they selected and why they rejected the alternatives. 

Suggest that students ask “so what?” or “why does this matter?” at the end of an explanation, especially 
when choosing a data collection method. Correctly addressing why the choice matters is often the 
difference between a substantial response and a complete response. 

Make sure students answer the question. 
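
The difference between P(A or D) and P(A and D) can be made concrete by enumeration. The following Python 
sketch is a hedged illustration and not the exam's wording: it assumes four people (two men and two women) 
split into groups of two, a coin method that flips a fair coin for each person in turn until one group is 
full, and a chip method in which every choice of two people for treatment is equally likely. All function 
names are hypothetical. 

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical setup (an assumption, not quoted from the exam): four people,
# two per group. Each arrangement is identified by who is in treatment.
PEOPLE = ["Man 1", "Man 2", "Woman 1", "Woman 2"]


def chip_method():
    """Every choice of two people for treatment is equally likely."""
    groups = list(combinations(PEOPLE, 2))
    return {group: Fraction(1, len(groups)) for group in groups}


def coin_method():
    """Flip a fair coin per person in order; once one group has two people,
    everyone remaining goes to the other group without a flip."""
    probs = {}

    def assign(index, treatment, control, prob):
        if len(treatment) == 2 or len(control) == 2:
            rest = PEOPLE[index:]
            # If control is full, the remaining people join treatment.
            final = treatment + rest if len(control) == 2 else treatment
            key = tuple(final)
            probs[key] = probs.get(key, Fraction(0)) + prob
            return
        person = PEOPLE[index]
        assign(index + 1, treatment + [person], control, prob / 2)  # heads
        assign(index + 1, treatment, control + [person], prob / 2)  # tails

    assign(0, [], [], Fraction(1))
    return probs


def both_men_same_group(probs):
    """P(A or D) = P(A) + P(D) - P(A and D), where A is 'both men in
    treatment' and D is 'both men in control'. A and D are mutually
    exclusive, so P(A and D) = 0 and the probabilities simply add."""
    p_a = probs[("Man 1", "Man 2")]
    p_d = probs[("Woman 1", "Woman 2")]  # both women in treatment
    p_a_and_d = Fraction(0)
    return p_a + p_d - p_a_and_d


for name, method in [("coin", coin_method), ("chip", chip_method)]:
    table = method()
    print(name, {k: str(v) for k, v in table.items()})
    print(name, "P(both men in same group) =", both_men_same_group(table))
```

Under these assumptions the two arrangements cannot occur together, so multiplying P(A) by P(D), or 
subtracting that product from the sum, answers a different question from the one asked. 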


What resources would you recommend to teachers to better prepare their students for the 
content and skill(s) required on this question? 


In general, review of previous Chief Reader reports will give teachers excellent insight into what constitutes strong 
statistical reasoning, as well as common student errors and how to address them in the classroom. Previously 
released exam questions and participation in the Online Teacher Community are of high value for practice and for 